This report explores a dataset containing price, certification, and 9 physical attributes for approximately 597,000 diamonds. The dataset was created by Solomon Messing in 2014, and can be found here.

Summary Statistics

Dimensions of the dataset: 597,311 observations of 11 variables.

## [1] 597311     11

Summary of variables

##      carat           cut             color          clarity      
##  Min.   :0.200   Ideal :369346   G      :96053   SI1    :116468  
##  1st Qu.:0.500   V.Good:168550   F      :93452   VS2    :110997  
##  Median :0.900   Good  : 59415   E      :93374   SI2    :104104  
##  Mean   :1.072                   H      :86555   VS1    : 97677  
##  3rd Qu.:1.500                   D      :73563   VVS2   : 65480  
##  Max.   :9.250                   I      :70213   VVS1   : 54790  
##                                  (Other):84101   (Other): 47795  
##      table           depth               cert            price      
##  Min.   : 0.00   Min.   : 0.00   GIA       :463066   Min.   :  300  
##  1st Qu.:56.00   1st Qu.:61.00   IGI       : 43497   1st Qu.: 1220  
##  Median :58.00   Median :62.10   EGL       : 33770   Median : 3503  
##  Mean   :57.63   Mean   :61.06   EGL USA   : 16070   Mean   : 8753  
##  3rd Qu.:59.00   3rd Qu.:62.70   EGL Intl. : 11447   3rd Qu.:11174  
##  Max.   :75.90   Max.   :81.30   EGL ISRAEL: 11301   Max.   :99990  
##                                  (Other)   : 18160                  
##        x                y                z         
##  Min.   : 0.150   Min.   : 1.000   Min.   : 0.040  
##  1st Qu.: 4.740   1st Qu.: 4.970   1st Qu.: 3.120  
##  Median : 5.780   Median : 6.050   Median : 3.860  
##  Mean   : 5.993   Mean   : 6.201   Mean   : 4.035  
##  3rd Qu.: 6.970   3rd Qu.: 7.230   3rd Qu.: 4.610  
##  Max.   :13.890   Max.   :13.890   Max.   :13.180  
##  NA's   :1814     NA's   :1851     NA's   :2543
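As a minimal sketch of how output like the above is produced, the snippet below runs `dim()`, `summary()` and a per-column NA count on a small toy data frame. The data frame `toy` and its values are invented for illustration; only the column names follow the report.

```r
# Toy stand-in for the real data (values are made up for illustration).
toy <- data.frame(
  carat = c(0.5, 0.9, 1.5, 2.0),
  price = c(1220, 3503, 11174, 20000),
  x = c(5.1, NA, 7.3, 8.0),
  y = c(5.2, 6.1, NA, 8.1),
  z = c(3.1, 3.8, NA, NA)
)

dims <- dim(toy)                  # number of rows, number of columns
na_counts <- colSums(is.na(toy))  # missing values per column

print(dims)
print(na_counts)
print(summary(toy))               # same kind of table as shown above
```

Counting NAs per column up front makes it easier to see which variables will need attention during cleaning.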

Univariate plots

Carat

Interestingly, there are peaks at each integer carat value up to 7 carats, and similarly visible peaks at the half-carat values from 0.5 to 3.5 carats. This may be due to a cultural stigma against purchasing a diamond under a certain weight, or buyers may prefer a diamond of lesser cut, color or clarity that meets or exceeds these carat values.

Carat summary statistics:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.200   0.500   0.900   1.072   1.500   9.250
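One way to check for peaks at round carat weights without a plot is to count diamonds per hundredth of a carat. The sketch below does this on a short invented vector; working in integer hundredths is my own choice to sidestep floating-point comparison problems, not necessarily what the original analysis did.

```r
# Invented carat weights, for illustration only.
carat <- c(0.48, 0.50, 0.50, 0.51, 0.97, 0.99,
           1.00, 1.00, 1.00, 1.01, 1.49, 1.50, 1.50)

# Work in integer hundredths of a carat to avoid floating-point issues.
hundredths <- round(carat * 100)

# Number of diamonds at each distinct weight.
counts <- table(hundredths)

# Diamonds sitting exactly on a multiple of 0.50 carat.
at_half_step <- hundredths %% 50 == 0
n_at <- sum(at_half_step)

print(counts)
print(n_at)
```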

Price

I transformed prices from a linear to a log scale to better understand the shape of the data. There are two distinct peaks, one around $800 and a second around $12,500. This suggests the market for diamonds may actually be two separate markets: one for diamonds priced up to about $12,500, and a second for diamonds priced from $12,500 up.

Price summary statistics:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     300    1220    3503    8753   11170   99990
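To illustrate the log transform, the sketch below shows where the two peaks mentioned above land on a base-10 log axis and bins a synthetic bimodal price sample. The simulated prices (and the seed, means and spreads used to generate them) are assumptions made purely for illustration.

```r
# Positions of the two peaks described in the text on a log10 axis.
peaks_usd <- c(800, 12500)
peaks_log10 <- round(log10(peaks_usd), 2)
print(peaks_log10)

# Synthetic two-peaked price sample (not the real data).
set.seed(1)
price <- c(10^rnorm(500, mean = 2.9, sd = 0.15),
           10^rnorm(500, mean = 4.1, sd = 0.15))

# Bin the log10 prices without drawing a plot.
h <- hist(log10(price), breaks = 30, plot = FALSE)
print(h$counts)
```

On a linear axis the lower cluster would be squeezed against zero; on the log axis both clusters get comparable room.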

Cut, color and clarity

The cut quality of the diamonds is skewed toward the top grade, with most diamonds rated ‘Ideal’.

Cut summary statistics:

##  Ideal V.Good   Good 
## 369346 168550  59415

Similarly, color quality (an earlier letter is better) is skewed toward the better grades. It appears most consumers are satisfied with a diamond of color H or better; roughly 74% of the diamonds fall in that range.

Color summary statistics:

##     D     E     F     G     H     I     J     K     L 
## 73563 93374 93452 96053 86555 70213 48645 25807  9649

Clarity seems to have a threshold at SI2: relatively few diamonds sold for jewelry fall below this clarity. More than half of all diamonds (about 60%) have a clarity of at least VS2. Surprisingly, more than 5% of all diamonds are graded Internally Flawless (IF), and more diamonds are classified IF than I1 and I2 combined.

Clarity summary statistics:

##     IF   VVS1   VVS2    VS1    VS2    SI1    SI2     I1     I2     I3 
##  31156  54790  65480  97677 110997 116468 104104  14355   2284      0
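The shares quoted above can be recomputed directly from the clarity counts in the table. The sketch below hard-codes those counts; the variable names are my own.

```r
# Clarity counts copied from the table above.
clarity_counts <- c(IF = 31156, VVS1 = 54790, VVS2 = 65480, VS1 = 97677,
                    VS2 = 110997, SI1 = 116468, SI2 = 104104,
                    I1 = 14355, I2 = 2284, I3 = 0)

total <- sum(clarity_counts)

# Share of diamonds graded IF.
share_if <- clarity_counts[["IF"]] / total

# Share of diamonds graded VS2 or better.
vs2_or_better <- c("IF", "VVS1", "VVS2", "VS1", "VS2")
share_vs2_plus <- sum(clarity_counts[vs2_or_better]) / total

# IF count versus I1 and I2 combined.
low_grades <- clarity_counts[["I1"]] + clarity_counts[["I2"]]

print(round(100 * c(IF = share_if, VS2_or_better = share_vs2_plus), 1))
```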

Certification agency

GIA certifies the vast majority of the diamonds included in this dataset, far more than all other certification agencies combined. I wonder if the different certification agencies specialize in different types of diamonds. Which agency has the highest proportion of low-quality diamonds? Which agency has the highest median price? Do you get more diamond for your money from some agencies?

Certification summary statistics:

##        GIA        IGI        EGL    EGL USA  EGL Intl. EGL ISRAEL 
##     463066      43497      33770      16070      11447      11301 
##        HRD        AGS      OTHER 
##       9936       2958       5266
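A question such as "which agency has the highest median price?" can be answered with a grouped median. The sketch below uses `tapply()` on an invented toy data frame; the prices are made up, and only the `cert` and `price` column names follow the report.

```r
# Invented example rows (not the real data).
toy <- data.frame(
  cert  = c("GIA", "GIA", "GIA", "IGI", "IGI", "EGL"),
  price = c(5000, 9000, 12000, 1500, 2500, 3000)
)

# Median price within each certification agency.
median_by_cert <- tapply(toy$price, toy$cert, median)

# Agency with the highest median price in this toy sample.
top <- names(which.max(median_by_cert))

print(median_by_cert)
print(top)
```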

Depth

There are 8,066 diamonds with a depth of 0, and another 663 diamonds with a depth between 0% and 10%. Depth is a percentage with typical values around 62, so the summary statistics below suggest these values may be data entry errors; perhaps some of the data is off by an order of magnitude?

I find it interesting that more than 12% of all IGI-certified diamonds fall into this potential error case. I’ll pay attention to this certification agency going forward to see if any other discrepancies arise.

##      carat            cut           color         clarity    
##  Min.   :0.2000   Ideal :3984   H      :1320   SI2    :1494  
##  1st Qu.:0.4200   V.Good:3981   G      :1289   VS2    :1393  
##  Median :0.7000   Good  : 764   I      :1258   IF     :1240  
##  Mean   :0.9308                 F      :1143   SI1    :1234  
##  3rd Qu.:1.0100                 E      :1135   VS1    :1203  
##  Max.   :6.6900                 D      : 913   VVS2   :1013  
##                                 (Other):1671   (Other):1152  
##      table           depth                cert          price      
##  Min.   : 0.00   Min.   :0.00000   IGI      :5284   Min.   :  320  
##  1st Qu.: 0.00   1st Qu.:0.00000   GIA      :2099   1st Qu.: 1030  
##  Median :56.50   Median :0.00000   HRD      : 700   Median : 2018  
##  Mean   :36.84   Mean   :0.09293   OTHER    : 283   Mean   : 7060  
##  3rd Qu.:59.00   3rd Qu.:0.00000   EGL USA  : 269   3rd Qu.: 6020  
##  Max.   :69.00   Max.   :7.80000   EGL Intl.:  66   Max.   :99640  
##                                    (Other)  :  28                  
##        x                y                z         
##  Min.   : 0.690   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.: 4.830   1st Qu.: 4.820   1st Qu.: 2.980  
##  Median : 5.620   Median : 5.610   Median : 3.530  
##  Mean   : 5.861   Mean   : 5.864   Mean   : 3.673  
##  3rd Qu.: 6.400   3rd Qu.: 6.450   3rd Qu.: 4.030  
##  Max.   :11.940   Max.   :12.040   Max.   :10.240  
##  NA's   :311      NA's   :313      NA's   :340
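To put the per-agency counts in context, the sketch below divides each agency's count in this low-depth subset by its count in the full dataset. Both sets of numbers are copied from the tables in this report; the variable names are my own.

```r
# Diamonds with depth below 10, by agency (from the table above).
flagged <- c(IGI = 5284, GIA = 2099, HRD = 700,
             "EGL USA" = 269, "EGL Intl." = 66)

# Total diamonds per agency (from the certification summary).
totals <- c(IGI = 43497, GIA = 463066, HRD = 9936,
            "EGL USA" = 16070, "EGL Intl." = 11447)

# Share of each agency's diamonds that fall into the low-depth subset.
rate <- flagged / totals[names(flagged)]

print(round(100 * rate, 1))
```

Comparing rates rather than raw counts matters here because GIA certifies far more diamonds than any other agency.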

Here is a histogram excluding diamonds with a depth below 10%.

Depth summary statistics (depth > 10%):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30.10   61.10   62.10   61.97   62.70   81.30

The sweet spot for diamond depth is between 60% and 65%.

Table

We see a similar issue with table: 2,981 diamonds have a table value of 0%, and another 598 have a table value between 0% and 10%. Looking deeper at these data points reveals that all other variables are populated, with the exception of depth. Again, IGI is represented at a higher rate than would be expected: 1,008 of these 3,579 diamonds (about 28%), versus about 7% of the full dataset.

Below we see the summary stats for diamonds with a table of < 10%:

##      carat            cut           color        clarity   
##  Min.   :0.2000   Ideal :1742   G      :682   SI2    :932  
##  1st Qu.:0.4100   V.Good:1415   H      :630   SI1    :706  
##  Median :0.7000   Good  : 422   F      :616   VS2    :592  
##  Mean   :0.9016                 E      :569   VS1    :531  
##  3rd Qu.:1.0200                 I      :439   VVS2   :285  
##  Max.   :8.0300                 D      :334   VVS1   :207  
##                                 (Other):309   (Other):326  
##      table            depth               cert          price      
##  Min.   :0.0000   Min.   : 0.000   GIA      :1722   Min.   :  320  
##  1st Qu.:0.0000   1st Qu.: 0.000   IGI      :1008   1st Qu.: 1050  
##  Median :0.0000   Median : 0.000   OTHER    : 338   Median : 2200  
##  Mean   :0.1284   Mean   : 5.607   EGL USA  : 304   Mean   : 6522  
##  3rd Qu.:0.0000   3rd Qu.: 0.000   HRD      : 129   3rd Qu.: 5686  
##  Max.   :6.3000   Max.   :75.200   EGL Intl.:  54   Max.   :97120  
##                                    (Other)  :  24                  
##        x                y                z        
##  Min.   : 0.690   Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 4.770   1st Qu.: 4.770   1st Qu.:2.980  
##  Median : 5.600   Median : 5.610   Median :3.560  
##  Mean   : 5.773   Mean   : 5.807   Mean   :3.675  
##  3rd Qu.: 6.420   3rd Qu.: 6.470   3rd Qu.:4.040  
##  Max.   :12.760   Max.   :12.870   Max.   :9.870  
##  NA's   :315      NA's   :315      NA's   :318

Table summary statistics (>10% table):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.00   56.00   58.00   57.97   59.00   75.90

The sweet spot for table is between 55% and 60%.

X, Y, and Z measurements

1,814 diamonds have an NA value for the x-axis measurement; otherwise they seem to be mostly complete entries. Below are the summary statistics for these diamonds.

##      carat            cut           color        clarity        table     
##  Min.   :0.2300   Ideal :1026   F      :412   SI1    :360   Min.   : 0.0  
##  1st Qu.:0.4000   V.Good: 594   G      :353   VS2    :330   1st Qu.:56.0  
##  Median :0.7000   Good  : 194   E      :298   SI2    :298   Median :57.0  
##  Mean   :0.8577                 H      :297   VS1    :279   Mean   :47.9  
##  3rd Qu.:1.0200                 D      :184   VVS2   :221   3rd Qu.:58.0  
##  Max.   :5.5400                 I      :125   VVS1   :215   Max.   :68.0  
##                                 (Other):145   (Other):111                 
##      depth            cert          price             x       
##  Min.   : 0.00   GIA    :1453   Min.   :  502   Min.   : NA   
##  1st Qu.:60.40   IGI    : 110   1st Qu.: 1123   1st Qu.: NA   
##  Median :61.90   OTHER  :  95   Median : 2294   Median : NA   
##  Mean   :51.61   HRD    :  86   Mean   : 5428   Mean   :NaN   
##  3rd Qu.:62.80   EGL USA:  32   3rd Qu.: 5818   3rd Qu.: NA   
##  Max.   :70.20   EGL    :  31   Max.   :81180   Max.   : NA   
##                  (Other):   7                   NA's   :1814  
##        y                z        
##  Min.   : 3.210   Min.   :2.480  
##  1st Qu.: 4.935   1st Qu.:2.880  
##  Median : 5.585   Median :3.330  
##  Mean   : 6.072   Mean   :3.496  
##  3rd Qu.: 7.372   3rd Qu.:3.950  
##  Max.   :10.740   Max.   :7.460  
##  NA's   :1782     NA's   :869

1,851 diamonds contain NA values for the y-axis measurement. Below are the summary statistics for these diamonds.

##      carat            cut           color        clarity   
##  Min.   :0.2300   Ideal :1048   F      :414   SI1    :367  
##  1st Qu.:0.4000   V.Good: 606   G      :353   VS2    :331  
##  Median :0.7000   Good  : 197   H      :323   SI2    :303  
##  Mean   :0.8688                 E      :296   VS1    :280  
##  3rd Qu.:1.0300                 D      :186   VVS2   :244  
##  Max.   :5.5400                 I      :127   VVS1   :212  
##                                 (Other):152   (Other):114  
##      table           depth            cert          price      
##  Min.   : 0.00   Min.   : 0.00   GIA    :1473   Min.   :  502  
##  1st Qu.:56.00   1st Qu.:60.40   IGI    : 115   1st Qu.: 1130  
##  Median :57.00   Median :61.90   OTHER  :  95   Median : 2284  
##  Mean   :48.08   Mean   :51.79   HRD    :  87   Mean   : 5536  
##  3rd Qu.:58.00   3rd Qu.:62.80   EGL USA:  38   3rd Qu.: 5868  
##  Max.   :68.00   Max.   :70.20   EGL    :  27   Max.   :81180  
##                                  (Other):  16                  
##        x                y              z        
##  Min.   : 4.610   Min.   : NA    Min.   :0.340  
##  1st Qu.: 5.120   1st Qu.: NA    1st Qu.:2.880  
##  Median : 5.970   Median : NA    Median :3.340  
##  Mean   : 6.588   Mean   :NaN    Mean   :3.514  
##  3rd Qu.: 7.580   3rd Qu.: NA    3rd Qu.:3.980  
##  Max.   :10.840   Max.   : NA    Max.   :7.510  
##  NA's   :1782     NA's   :1851   NA's   :891

2,543 diamonds contain NA values for the z-axis measurement. Below are the summary statistics for these diamonds.

##      carat           cut           color        clarity        table      
##  Min.   :0.200   Ideal :1687   G      :501   VS2    :407   Min.   : 0.00  
##  1st Qu.:0.550   V.Good: 672   H      :442   VS1    :397   1st Qu.:56.00  
##  Median :0.900   Good  : 184   F      :439   SI1    :384   Median :57.00  
##  Mean   :1.104                 E      :348   VVS1   :375   Mean   :50.47  
##  3rd Qu.:1.500                 I      :272   VVS2   :337   3rd Qu.:58.00  
##  Max.   :5.540                 D      :264   SI2    :298   Max.   :68.00  
##                                (Other):277   (Other):345                  
##      depth            cert          price             x         
##  Min.   : 0.00   GIA    :2255   Min.   :  350   Min.   : 2.000  
##  1st Qu.:60.70   OTHER  : 111   1st Qu.: 1990   1st Qu.: 5.530  
##  Median :62.10   IGI    :  64   Median : 4292   Median : 6.115  
##  Mean   :54.04   HRD    :  46   Mean   : 9693   Mean   : 6.401  
##  3rd Qu.:62.80   EGL    :  36   3rd Qu.:11882   3rd Qu.: 7.350  
##  Max.   :70.20   EGL USA:  18   Max.   :94849   Max.   :10.330  
##                  (Other):  13                   NA's   :869     
##        y                z       
##  Min.   : 3.100   Min.   : NA   
##  1st Qu.: 5.590   1st Qu.: NA   
##  Median : 6.170   Median : NA   
##  Mean   : 6.464   Mean   :NaN   
##  3rd Qu.: 7.410   3rd Qu.: NA   
##  Max.   :10.380   Max.   : NA   
##  NA's   :891      NA's   :2543

Cleaning the dataset

Given the size of this dataset, I feel comfortable dropping any diamonds with a table or depth value below 10%, or with an NA value for x, y or z. Small but non-missing x, y and z values are retained, as the minimums in the summary below show. I will be using this revised dataset for the remainder of this analysis. This leaves me with 585,088 diamonds to examine. I’ve saved the ‘dirty’ dataset of 12,223 diamonds to analyze in more depth below.
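A minimal sketch of this kind of cleaning step is shown below on an invented toy data frame. The exact rule is an assumption on my part: it drops rows with a missing x, y or z and rows with table or depth below 10, which is consistent with the cleaned summary still containing x and z values below 10. The names `toy`, `cleaned_toy` and `dirty_toy` are hypothetical.

```r
# Invented rows covering each kind of problem (not the real data).
toy <- data.frame(
  table = c(57, 0, 58, 6.3, 59),
  depth = c(62.1, 61.0, 0, 61.5, 62.7),
  x = c(5.8, 5.6, 5.7, NA, 6.9),
  y = c(6.0, 5.6, 5.7, 5.9, 7.2),
  z = c(3.9, 3.5, 3.6, 3.7, NA)
)

# Assumed rule: keep rows with all three measurements present
# and with plausible table and depth percentages.
keep <- complete.cases(toy[, c("x", "y", "z")]) &
  toy$table >= 10 &
  toy$depth >= 10

cleaned_toy <- toy[keep, ]   # rows used for the rest of the analysis
dirty_toy   <- toy[!keep, ]  # rows set aside for later inspection

print(nrow(cleaned_toy))
print(nrow(dirty_toy))
```

Keeping the rejected rows in their own data frame, rather than discarding them, makes it possible to check later whether the errors cluster by agency.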

Summary statistics for this cleaned dataset are below:

##      carat           cut             color          clarity      
##  Min.   :0.200   Ideal :363083   G      :94104   SI1    :114673  
##  1st Qu.:0.500   V.Good:163616   E      :91705   VS2    :109006  
##  Median :0.900   Good  : 58389   F      :91665   SI2    :102130  
##  Mean   :1.074                   H      :84610   VS1    : 95922  
##  3rd Qu.:1.500                   D      :72289   VVS2   : 63986  
##  Max.   :9.250                   I      :68613   VVS1   : 53417  
##                                  (Other):82102   (Other): 45954  
##      table           depth               cert            price      
##  Min.   :13.00   Min.   :30.10   GIA       :458110   Min.   :  300  
##  1st Qu.:56.00   1st Qu.:61.10   IGI       : 37976   1st Qu.: 1220  
##  Median :58.00   Median :62.10   EGL       : 33722   Median : 3539  
##  Mean   :57.97   Mean   :61.97   EGL USA   : 15711   Mean   : 8776  
##  3rd Qu.:59.00   3rd Qu.:62.70   EGL Intl. : 11371   3rd Qu.:11242  
##  Max.   :75.90   Max.   :81.30   EGL ISRAEL: 11271   Max.   :99990  
##                                  (Other)   : 16927                  
##        x                y                z         
##  Min.   : 0.150   Min.   : 1.430   Min.   : 0.040  
##  1st Qu.: 4.740   1st Qu.: 4.980   1st Qu.: 3.120  
##  Median : 5.780   Median : 6.070   Median : 3.870  
##  Mean   : 5.993   Mean   : 6.205   Mean   : 4.041  
##  3rd Qu.: 6.970   3rd Qu.: 7.240   3rd Qu.: 4.620  
##  Max.   :13.890   Max.   :13.890   Max.   :13.180  
## 

Best vs. worst quality diamonds

Below are the diamonds that have an Ideal cut, a color of D, and a clarity of IF.

It is hard to see what is going on in this plot, so I’ve replotted it below using a base-10 log scale for price.

Here are the summary statistics for the best quality diamonds. The most fascinating bit to me is that the maximum size is 2.58 carats, versus 9.25 carats for the heaviest stone in the complete dataset.

##      carat            cut           color         clarity    
##  Min.   :0.2000   Ideal :4639   D      :4639   IF     :4639  
##  1st Qu.:0.4400   V.Good:   0   E      :   0   VVS1   :   0  
##  Median :1.0200   Good  :   0   F      :   0   VVS2   :   0  
##  Mean   :0.9451                 G      :   0   VS1    :   0  
##  3rd Qu.:1.2900                 H      :   0   VS2    :   0  
##  Max.   :2.5800                 I      :   0   SI1    :   0  
##                                 (Other):   0   (Other):   0  
##      table           depth            cert          price      
##  Min.   :53.00   Min.   :56.20   GIA    :4372   Min.   :  435  
##  1st Qu.:56.00   1st Qu.:60.80   IGI    : 234   1st Qu.: 2165  
##  Median :57.00   Median :61.70   HRD    :  18   Median :19740  
##  Mean   :57.57   Mean   :61.46   AGS    :   4   Mean   :19557  
##  3rd Qu.:59.00   3rd Qu.:62.20   OTHER  :   4   3rd Qu.:29070  
##  Max.   :62.50   Max.   :63.80   EGL    :   3   Max.   :99458  
##                                  (Other):   4                  
##        x               y               z        
##  Min.   :2.290   Min.   :3.720   Min.   :1.500  
##  1st Qu.:4.700   1st Qu.:4.895   1st Qu.:3.100  
##  Median :6.430   Median :6.480   Median :4.000  
##  Mean   :5.972   Mean   :6.096   Mean   :3.862  
##  3rd Qu.:6.970   3rd Qu.:7.000   3rd Qu.:4.350  
##  Max.   :8.930   Max.   :8.930   Max.   :8.240  
## 

Now let’s check out the other end of the spectrum, what do the lowest quality diamonds look like?

I’m intrigued that the median and mean carat values for the worst quality diamonds match up almost exactly with those of the best quality diamonds: both are right around 1 carat!

summary(subset(cleaned, cut =='Good' & color == 'L' & clarity == 'I2'))
##      carat           cut         color       clarity       table      
##  Min.   :0.520   Ideal : 0   L      :19   I2     :19   Min.   :55.00  
##  1st Qu.:0.650   V.Good: 0   D      : 0   IF     : 0   1st Qu.:58.50  
##  Median :0.970   Good  :19   E      : 0   VVS1   : 0   Median :60.00  
##  Mean   :1.043               F      : 0   VVS2   : 0   Mean   :60.47  
##  3rd Qu.:1.260               G      : 0   VS1    : 0   3rd Qu.:62.00  
##  Max.   :2.000               H      : 0   VS2    : 0   Max.   :67.00  
##                              (Other): 0   (Other): 0                  
##      depth               cert        price              x        
##  Min.   :59.30   GIA       :13   Min.   : 410.0   Min.   :5.110  
##  1st Qu.:61.00   EGL       : 6   1st Qu.: 726.5   1st Qu.:5.510  
##  Median :62.80   IGI       : 0   Median :1365.0   Median :6.170  
##  Mean   :63.09   EGL USA   : 0   Mean   :1855.9   Mean   :6.257  
##  3rd Qu.:64.90   EGL Intl. : 0   3rd Qu.:2233.0   3rd Qu.:6.785  
##  Max.   :67.40   EGL ISRAEL: 0   Max.   :5957.0   Max.   :8.000  
##                  (Other)   : 0                                   
##        y               z        
##  Min.   :5.050   Min.   :3.220  
##  1st Qu.:5.570   1st Qu.:3.440  
##  Median :6.110   Median :3.870  
##  Mean   :6.242   Mean   :3.942  
##  3rd Qu.:6.730   3rd Qu.:4.310  
##  Max.   :8.000   Max.   :4.970  
## 

Dataset

Structure of dataset

After cleaning my dataset of nonsensical and NA values, 585,088 diamonds remain with 11 features (carat, cut, color, clarity, table, depth, certification agency, price, x, y, and z). The variables cut, color and clarity are ordered factor variables with the following levels:

(worst) ————> (best)

Cut: Good, Very Good, Ideal

Color: L, K, J, I, H, G, F, E, D

Clarity: I3, I2, I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF
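As a sketch of what an ordered factor buys you in R, the snippet below builds the three level orderings listed above and uses them for a threshold comparison. The sample values are invented, and the cut label `V.Good` follows the summary output rather than the spelled-out name.

```r
# Level orderings from worst to best, as listed above.
cut_levels     <- c("Good", "V.Good", "Ideal")
color_levels   <- c("L", "K", "J", "I", "H", "G", "F", "E", "D")
clarity_levels <- c("I3", "I2", "I1", "SI2", "SI1",
                    "VS2", "VS1", "VVS2", "VVS1", "IF")

# A few invented clarity grades stored as an ordered factor.
clarity <- factor(c("SI2", "IF", "VS1"),
                  levels = clarity_levels, ordered = TRUE)

# Ordered factors support comparisons against a level name.
at_least_vs2 <- clarity >= "VS2"

# They also support min() and max().
best <- as.character(max(clarity))

print(at_least_vs2)
print(best)
```

With a plain (unordered) factor the `>=` comparison would return NA with a warning, so the ordering is what makes thresholds like "VS2 or better" expressible.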

Other observations:

  • More diamonds are of Ideal cut than are of the other two cuts combined.

  • Median carat size is 0.900.

  • Most diamonds are color G or above.

  • 75% of all diamonds in my cleaned dataset are 1.5 carat or less.

  • Median price is $3,539 with a high value of $99,990.

Features of interest

The features that are most interesting as output values for a model are carat and price. I’m interested in diving into the differences between the different certification agencies. Do some agencies specialize in lesser-quality diamonds? Additionally I’m excited to look at the spike in the number of diamonds sold with weights at or above integer valued carats. I am intrigued to see if lesser cut, clarity or color diamonds are kept bigger to sell at or above these integer values.

Bivariate and Multivariate plots

Correlation matrix

##       price carat table depth     x     y     z
## price  1.00  0.86  0.03 -0.08  0.72  0.80  0.64
## carat  0.86  1.00  0.05 -0.05  0.86  0.96  0.79
## table  0.03  0.05  1.00 -0.47  0.04  0.06  0.03
## depth -0.08 -0.05 -0.47  1.00 -0.06 -0.09 -0.01
## x      0.72  0.86  0.04 -0.06  1.00  0.89  0.48
## y      0.80  0.96  0.06 -0.09  0.89  1.00  0.82
## z      0.64  0.79  0.03 -0.01  0.48  0.82  1.00
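A matrix like the one above can be produced with `cor()` on the numeric columns and rounded to two decimals. The sketch below does this on synthetic data; the simulated relationship between carat, price and x is an assumption chosen only so the example has something to correlate.

```r
# Synthetic data: price and x both increase with carat (illustration only).
set.seed(42)
n <- 200
carat <- runif(n, min = 0.2, max = 3)
toy <- data.frame(
  price = 4000 * carat^1.5 + rnorm(n, sd = 500),
  carat = carat,
  x     = 6.5 * carat^(1 / 3) + rnorm(n, sd = 0.05)
)

# Pearson correlation between every pair of numeric columns.
corr <- round(cor(toy), 2)
print(corr)
```

`cor()` returns NA for any pair involving missing values unless told otherwise, which is one reason to run it on the cleaned data.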

Price vs. Carat

The following plots examine the relationship between price and carat weight for diamonds. First we will look at the overall density plot. Note the large vertical streaks that occur at carat weights ending with .50 and .00. I will be referring to this preference for diamonds to exceed a ‘round’ carat weight as a vanity metric.

Price vs. Carat - by Cert. Agency

Let’s start by looking at price vs carat by certification agency to see if any agencies have specialties. GIA dominates the certification market in sheer numbers as well as price. Diamonds certified by EGL USA and IGI both command less of a price premium.

Price vs. Carat - Color

Here is the same plot colored by the diamond’s color. It does appear that many 2 and 3 carat diamonds of lesser color were allowed to come to market relative to other weights.

Price vs. Carat - Cut

Now looking at the same plot by cut quality, not many non-ideal cuts are allowed to come to market.

Price vs. Carat - Clarity

Similar to color, we observe more diamonds of lesser clarity coming to market at the 1, 1.5, 2 and 3 carat weights.

Quality vs price - Vanity weights

Now I’m getting really interested in the behaviour of diamond prices vs. quality at the vanity points. Below we can start to observe how prices change with differing color and/or clarity at the vanity points. Additionally, notice how few diamonds are offered in the low quality quadrant; perhaps diamonds with these ratings are used industrially instead of for jewelry?

Quality vs. Cut - Vanity weights

Here we look at diamond quality and cut quality at the vanity points. There are easily visible thresholds (I1 clarity and K color) that a diamond must exceed to be viable on the market.

Diamond quality and vanity

To dive deeper into the vanity idea, I’ve replotted all diamonds by color and clarity. This time the colors signify whether the diamond belongs to the vanity group or not. We see that top quality diamonds tend not to be vanity diamonds; instead, the vanity diamonds tend to have a color of J or better and a clarity of SI2 to VS1.
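The report does not spell out how membership in the vanity group is decided, so the following is only one plausible sketch. It assumes a diamond counts as a vanity diamond when its weight is at, or less than 0.10 carat above, a multiple of 0.50 carat; both the window width and the variable names are my assumptions.

```r
# Invented carat weights, for illustration only.
carat <- c(0.49, 0.50, 0.55, 0.60, 0.99, 1.00, 1.09, 1.10, 1.50)

# Integer hundredths of a carat avoid floating-point comparison issues.
hundredths <- round(carat * 100)

# Assumed rule: within [0.00, 0.10) carat above a multiple of 0.50.
is_vanity <- (hundredths %% 50) < 10

print(data.frame(carat = carat, is_vanity = is_vanity))
```

A logical column like `is_vanity` could then be mapped to color in a plot, or used to split the data for the comparisons that follow.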

Price vs Carat - Subset by Clarity

Observe the price jumps for vanity diamonds; these jumps are consistent across the various clarity levels.

Price vs. Carat - Subset by Color

We see the same price hike for vanity diamonds across color levels as well.

Price vs. Color - Vanity weights by Cert. Agency

Here are the color offerings for vanity weight diamonds by certification agency. Again, GIA seems to certify the vast majority of high quality diamonds, while others specialize in bringing lesser quality diamonds to market.

Price vs Carat - Subset by Cert Agency (Non-Vanity weights)

Now we are looking at these same plots but for the non-vanity weight diamonds. GIA diamonds tend to be more expensive and make up nearly all of the high-end market.

Examining the Vanity Price jump

Below we can examine the vanity price jump subset by cut, color, clarity and certification agency. We observe GIA commands the largest price premium, and has a stranglehold on the high-end market (by quality and price).

Final Plots

Plot 1

Description 1

This plot highlights the price jump that occurs every 0.50 carat. This price jump is visible through all of the 3 C’s (clarity, cut and color) of a diamond.

Plot 2

Description 2

This plot makes clear two interesting trends in the data - the spike in quantity of diamonds sold at vanity points as well as the drop in quality for diamonds at these same weights. The diamonds have been binned into 0.1 carat buckets. The color, clarity and cut proportions were then plotted for each of these bins. Observe the spikes in the total # of diamonds sold at the 1.0, 1.5, 2.0 and 3.0 carat weights. Notice also the disproportionate percentage of low quality (holds true for color, clarity & cut) diamonds present at these 4 weights. We can see the saturation of the diamond market at ‘vanity’ weights with lower quality diamonds relative to the non-vanity weights.
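The binning described above can be sketched as follows: assign each diamond to a 0.1 carat bucket, cross-tabulate bucket against a quality variable, and convert each row to proportions. The toy data is invented, and labelling buckets by integer tenths of a carat is my own choice to keep the bucket boundaries exact.

```r
# Invented diamonds around the 1.0 carat mark (not the real data).
toy <- data.frame(
  carat = c(0.92, 0.95, 0.98, 1.00, 1.01, 1.02, 1.03, 1.05),
  cut = factor(c("Ideal", "Ideal", "V.Good", "Good",
                 "Good", "V.Good", "Ideal", "Good"),
               levels = c("Good", "V.Good", "Ideal"))
)

# Bucket label in tenths of a carat: 9 means [0.90, 1.00), 10 means [1.00, 1.10).
toy$bin_tenths <- floor(round(toy$carat * 100) / 10)

# Diamonds per bucket and cut, then the cut proportions within each bucket.
counts <- table(toy$bin_tenths, toy$cut)
props  <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 2))
```

Row totals of `counts` give the quantity spike per bucket, while `props` gives the quality mix, which are the two trends the plot combines.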

Plot 3

Description 3

Diving deeper into the relationship examined in plot 1, I decided to identify the price premium for a diamond that weighs at least 1 carat. These three box plots examine the price premium commanded by diamonds around the 1.00 carat vanity point. First, looking at all diamonds, observe that the median diamond exhibits a price premium of $1,491 once carat weight reaches 1.00. Next, notice the price premium broken down by quality of diamond. I find it fascinating that the median price of a vanity diamond exceeds the upper range of the corresponding box for the non-vanity diamonds. This relationship holds for all diamonds in this weight range, with the exception of those with a color rating of G.

These price premiums are surprisingly robust to changes in quality, and for the highest quality diamonds they can nearly double the price relative to a corresponding diamond of similar size and weight. Looking at median prices, you have to drop 3 levels in either clarity or color to find a vanity diamond of similar price.
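The median premium at the 1.00 carat point can be computed as a difference of medians between diamonds just below and at or above the threshold. The sketch below uses invented prices and an assumed comparison window of 0.90 to 1.05 carat; the report does not state the window it used.

```r
# Invented diamonds near the 1.00 carat point (not the real data).
toy <- data.frame(
  carat = c(0.90, 0.95, 0.99, 1.00, 1.01, 1.05),
  price = c(4000, 4200, 4400, 5600, 5900, 6100)
)

# Split at the vanity threshold.
at_or_above <- toy$carat >= 1.00

median_above <- median(toy$price[at_or_above])
median_below <- median(toy$price[!at_or_above])

# Premium paid for crossing the 1.00 carat mark in this toy sample.
premium <- median_above - median_below

print(premium)
```

Repeating the same calculation within each color or clarity level gives the per-quality breakdown described above.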

Reflection

Starting from a data set of more than 597,000 diamonds across 11 variables from 2014, after cleaning up and removing incomplete entries I was left with just over 585,000 diamonds to work with. I began by exploring the individual variables, to get a feel for the shape of my data. Next I moved on to explore questions of interest as I examined what determines the price of a diamond. Eventually I dove deep into the price jump and quality decrease found in diamonds just above every 0.50 carat increment.

There is an observable trend of diamonds with lesser cuts and colors coming to market at or just above x.00 and x.50 carat weights. This trend holds across certification agencies and persists up to 2.5 carats.

I also found that prices increase dramatically for ‘perfect’ diamonds (Internally Flawless clarity as well as D color), holding carat weight constant. Consumers will easily pay 3x as much for an IF/D diamond as they will for an SI2/J (the threshold level below which the quantity available is significantly diminished).

I think it would be quite interesting to investigate the differences between the certification agencies’ diamond portfolios going forward, especially if I could get a location of origin for each diamond. This would open the door to identifying potential sources of fraud for diamonds coming out of war-torn areas; I would like to see if certain agencies have a ‘blind eye’ policy towards location of origin.